
    A Joint Model for Unsupervised Chinese Word Segmentation

    In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired by the 'products of experts' idea, our joint model first combines two generative models, a word-based hierarchical Dirichlet process (HDP) model and a character-based hidden Markov model (HMM), by simply multiplying their probabilities together. Gibbs sampling is used for model inference. To further incorporate the strength of a goodness-based model, we then integrate nVBE into the joint model by using it to initialize the Gibbs sampler. We conduct our experiments on the PKU and MSRA datasets provided by the second SIGHAN bakeoff. Test results on these two datasets show that the joint model achieves much better results than all of its component models. Statistical significance tests also show that it is significantly better than state-of-the-art systems, achieving the highest F-scores. Finally, analysis indicates that, compared with nVBE and HDP, the joint model has a stronger ability to resolve both combinational and overlapping ambiguities in Chinese word segmentation. © 2014 Association for Computational Linguistics.
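    The 'products of experts' combination described above amounts to scoring a candidate segmentation by the product of the two component models' probabilities, i.e. the sum of their log-probabilities. Below is a minimal sketch of that scoring step, assuming hypothetical `hdp_log_prob` and `hmm_log_prob` functions; it illustrates the idea rather than the paper's implementation.

```python
def joint_log_prob(segmentation, hdp_log_prob, hmm_log_prob):
    """Product-of-experts score: multiply the word-based HDP probability by
    the character-based HMM probability, i.e. add their log-probabilities."""
    return hdp_log_prob(segmentation) + hmm_log_prob(segmentation)

def best_segmentation(candidates, hdp_log_prob, hmm_log_prob):
    """Pick the candidate segmentation with the highest joint score."""
    return max(candidates,
               key=lambda seg: joint_log_prob(seg, hdp_log_prob, hmm_log_prob))
```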

    Neural Chinese Word Segmentation with Lexicon and Unlabeled Data via Posterior Regularization

    Existing methods for CWS usually rely on a large number of labeled sentences to train word segmentation models, and such sentences are expensive and time-consuming to annotate. Fortunately, unlabeled data is usually easy to collect, and many high-quality Chinese lexicons are available off the shelf; both can provide useful information for CWS. In this paper, we propose a neural approach for Chinese word segmentation which can exploit both a lexicon and unlabeled data. Our approach is based on a variant of the posterior regularization algorithm, and the unlabeled data and lexicon are incorporated into model training as indirect supervision by regularizing the prediction space of CWS models. Extensive experiments on multiple benchmark datasets in both in-domain and cross-domain scenarios validate the effectiveness of our approach. Comment: 7 pages, 11 figures, accepted by the 2019 World Wide Web Conference (WWW '19).
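    Posterior regularization of this kind typically adds a penalty that pulls the model's predicted tag distributions on unlabeled sentences toward a constraint distribution, here derived from lexicon matches. A minimal sketch of such a regularized objective, assuming per-character tag posteriors and a hypothetical constraint distribution (the names and the KL form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def kl(q, p, eps=1e-12):
    """KL(q || p) between two tag distributions for a single character."""
    q, p = np.asarray(q, float) + eps, np.asarray(p, float) + eps
    return float(np.sum(q * np.log(q / p)))

def pr_penalty(model_posteriors, constraint_posteriors):
    """Sum of per-character KL terms pulling the model's tag posteriors
    toward the lexicon-derived constraint distribution (indirect supervision)."""
    return sum(kl(q, p) for q, p in zip(constraint_posteriors, model_posteriors))

def regularized_loss(supervised_loss, model_posteriors, constraint_posteriors, lam=0.1):
    """Labeled-data loss plus the weighted posterior-regularization penalty."""
    return supervised_loss + lam * pr_penalty(model_posteriors, constraint_posteriors)
```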

    An Effective Neural Network Model for Graph-based Dependency Parsing

    Most existing graph-based parsing models rely on millions of hand-crafted features, which limits their generalization ability and slows down parsing. In this paper, we propose a general and effective neural network model for graph-based dependency parsing. Our model can automatically learn high-order feature combinations using only atomic features by exploiting a novel activation function, tanh-cube. Moreover, we propose a simple yet effective way to utilize phrase-level information that is expensive to use in conventional graph-based parsers. Experiments on the English Penn Treebank show that parsers based on our model perform better than conventional graph-based parsers. © 2015 Association for Computational Linguistics.
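    The tanh-cube activation combines a cubic term, which lets a unit form products of its inputs (and hence high-order feature combinations), with a bounded tanh output. A minimal sketch, assuming the commonly cited form tanh(x^3 + x); the exact form used in the paper should be checked against the original:

```python
import numpy as np

def tanh_cube(x):
    """Assumed tanh-cube activation tanh(x**3 + x): the cubic term mixes
    input features multiplicatively, while tanh keeps the output bounded."""
    return np.tanh(np.power(x, 3) + x)

# Toy usage on a vector of hidden pre-activations
print(tanh_cube(np.array([-1.5, 0.0, 0.3, 2.0])))
```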

    Max-Margin Tensor Neural Network for Chinese Word Segmentation

    Recently, neural network models for natural language processing tasks have attracted increasing attention for their ability to alleviate the burden of manual feature engineering. In this paper, we propose a novel neural network model for Chinese word segmentation called the Max-Margin Tensor Neural Network (MMTNN). By exploiting tag embeddings and a tensor-based transformation, MMTNN can model complicated interactions between tags and context characters. Furthermore, a new tensor factorization approach is proposed to speed up the model and avoid overfitting. Experiments on the benchmark dataset show that our model achieves better performance than previous neural network models and competitive performance with minimal feature engineering. Although Chinese word segmentation is a specific case, MMTNN can easily be generalized and applied to other sequence labeling tasks. © 2014 Association for Computational Linguistics.
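    A tensor layer scores an input with one bilinear form per output unit, which is expensive and prone to overfitting; factorizing each tensor slice into two thin matrices reduces both cost and parameter count. A minimal numpy sketch of that idea (the dimensions d, k and rank r are arbitrary assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 50, 20, 4                  # input size, output units, assumed low rank

T = rng.normal(size=(k, d, d))       # full tensor: one d x d slice per output
P = rng.normal(size=(k, d, r))       # low-rank factors: T[i] ~= P[i] @ Q[i]
Q = rng.normal(size=(k, r, d))

def tensor_layer_full(x):
    """Bilinear score x^T T[i] x for every output unit i (cost ~ k*d*d)."""
    return np.einsum('i,kij,j->k', x, T, x)

def tensor_layer_factored(x):
    """Same form with factored slices, computed as (x P[i]) . (Q[i] x), cost ~ k*d*r."""
    left = np.einsum('i,kir->kr', x, P)
    right = np.einsum('krj,j->kr', Q, x)
    return np.sum(left * right, axis=1)

x = rng.normal(size=d)
scores = tensor_layer_factored(x)
```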

    Characteristics Analysis of an Electromagnetic Actuator for Magnetic Levitation Transportation

    In this article, an electromagnetic actuator is proposed to improve the driving performance of magnetic levitation transportation applied to ultra-clean manufacturing. The electromagnetic actuator mainly consists of a stator with a Halbach array and a mover with a symmetrical structure. First, the actuator's principle and structure are illustrated. Then, to select a suitable secondary structure and analyze the characteristics of the actuator, the electromagnetic characteristics of actuators with different secondary structures are analyzed by the finite element method (FEM). The analysis shows that adopting a secondary structure with a Halbach array increases the electromagnetic force and improves working stability, so the secondary with the three-section Halbach array is selected for the electromagnetic actuator. Next, the influence of the secondary permanent magnet (PM) thickness on the electromagnetic force is analyzed by FEM. The results indicate that increasing the PM thickness raises the electromagnetic force but lowers the utilization ratio of the PM. Finally, a prototype of the electromagnetic actuator is built and experiments are carried out; the experimental results verify the correctness of the theoretical analysis and the effectiveness of the electromagnetic actuator.

    Carbon isotope and origin of the hydrocarbon gases in the Junggar Basin, China

    The genetic type, source, and distribution of hydrocarbon gases in the Junggar Basin were clarified through carbon isotope analysis. Mature to post-mature oil-type gas, mature to post-mature coal-type gas, transition gas, and biogas are identified in the Junggar Basin. The partly reversed carbon isotope order of the hydrocarbon gases is attributed to one or several of the following causes: mixing of oil-type and coal-type gases, mixing of coal-type gases from different sources, mixing of coal-type gases of varied maturity, and microbial action. Three types of coal-type gases are identified in the Junggar Basin. The first type, characterized by high δ13C values of the heavy hydrocarbon gases (δ13C2 > −26.0‰), comprises mature to highly mature gases generated from Jurassic source rocks. The second type, characterized by low δ13C values of the heavy hydrocarbon gases (δ13C2 < −26.0‰) and a wide maturity range, is generated from one or several source rocks in the Jurassic and in the Wuerhe and Jiamuhe Formations of the Permian. The third type, characterized by a wide range of δ13C values of the heavy hydrocarbon gases and high to post-maturity, is generated from Carboniferous source rocks. Keywords: Carbon isotope, Hydrocarbon gas, Gas-source correlation, Junggar Basin
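    The abstract's grouping of coal-type gases can be read as a simple screening rule on the δ13C2 value and maturity. A toy sketch of that rule, with the maturity labels as illustrative assumptions (the actual classification in the paper relies on fuller gas-source correlation, not a single threshold):

```python
def coal_gas_type(delta13c2_permil, maturity):
    """Rough screening using the -26.0 permil delta13C2 cut-off quoted above;
    'maturity' is an assumed qualitative label ('mature', 'high', 'post')."""
    if delta13c2_permil > -26.0:
        return "Type 1: mature to highly mature, Jurassic source rocks"
    if maturity in ("high", "post"):
        return "Type 3: high- to post-mature, Carboniferous source rocks"
    return "Type 2: wide maturity range, Jurassic/Permian source rocks"

print(coal_gas_type(-24.5, "mature"))  # Type 1
print(coal_gas_type(-28.0, "post"))    # Type 3
```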

    List of antagonistic combinations found in our study.


    Distribution of percentage of synergistic cases under various parameter sets for all combinations studied.

    Consistently synergistic and antagonistic combinations are marked, showing their stark contrast in number.